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Abstract 

Background: Protein loops are flexible structures that are intimately tied to function, but understanding loop 
motion and generating loop conformation ensembles remain significant computational challenges. Discrete search 
techniques scale poorly to large loops, optimization and molecular dynamics techniques are prone to local minima, 
and inverse kinematics techniques can only incorporate structural preferences in adhoc fashion. This paper presents 
Sub-Loop Inverse Kinematics Monte Carlo (SLIKMC), a new Markov chain Monte Carlo algorithm for generating 
conformations of closed loops according to experimentally available, heterogeneous structural preferences. 

Results: Our simulation experiments demonstrate that the method computes high-scoring conformations of large 
loops (>10 residues) orders of magnitude faster than standard Monte Carlo and discrete search techniques. Two 
new developments contribute to the scalability of the new method. First, structural preferences are specified via a 
probabilistic graphical model (PGM) that links conformation variables, spatial variables (e.g., atom positions), 
constraints and prior information in a unified framework. The method uses a sparse PGM that exploits locality of 
interactions between atoms and residues. Second, a novel method for sampling sub-loops is developed to 
generate statistically unbiased samples of probability densities restricted by loop-closure constraints. 

Conclusion: Numerical experiments confirm that SLIKMC generates conformation ensembles that are statistically 
consistent with specified structural preferences. Protein conformations with 100+ residues are sampled on standard 
PC hardware in seconds. Application to proteins involved in ion-binding demonstrate its potential as a tool for 
loop ensemble generation and missing structure completion. 



Background 

Sampling conformations of kinematic chains - rigid 
objects connected by articulated joints - is a fundamental 
problem in protein structure prediction, the geometry of 
folding linkages, and robot motion planning. Sampling 
poses a challenging computational problem when chains 
are large and must satisfy a variety of constraints and sta- 
tistical preferences. Conformations may be required to 
satisfy hard feasibility constraints, such as loop closure and 
collision avoidance, while also obeying soft preference 
constraints, such as low energy and high structural likeli- 
hood. Particularly around folded protein structures, the 
subset of feasible and favorable conformations comprises a 
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miniscule fraction of the conformation space, and due to 
the "curse of dimensionality" this fraction shrinks dramati- 
cally with the dimensionality of the state space. Because 
interesting biological macromolecules have large numbers 
of degrees of freedom, ranging up to hundreds or thou- 
sands, new techniques are needed to sample severely con- 
strained conformations efficiently. 

Protein loops are flexible structures that often deform 
during binding, and are extremely important for under- 
standing protein functioning [1]. Loop sampling has been 
used in missing fragment reconstruction, generating fluc- 
tuations in equilibrium conformations, and generating 
decoy sets for function prediction. Such applications typi- 
cally require methods for sampling energetically-likely and 
diverse configuration ensembles rather than optimizing a 
single point estimate. The loop closure constraint, which 
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requires the terminal atoms of a loop to lie at fixed posi- 
tions dictated by the surrounding structured regions, 
poses a major challenge in sampling. Existing loop sam- 
pling methods include discrete search [2], optimization 
[1], and inverse kinematics (IK) methods [3-6]. Fiser et al's 
[1] approach takes an energy function that encodes spatial 
restraints and preferences on dihedral angles, and then 
runs a numerical optimization to minimize that energy. 
Optimization is relatively computationally expensive, 
which is usually worthwhile for single structure prediction 
but less so for generating conformation ensembles. 
Discrete search methods are able to explore a wider space 
of conformations by incrementally building a tree of clash- 
free subchain conformations starting from one end of the 
loop and progressing toward the terminal end [2,7,8]. 
Although effective for small loops, these methods do not 
scale well to large loops due to the problem of combina- 
torial explosion. As a result, discrete search is intractable 
with chains containing 7-8 or more residues. They also 
introduce discretization artifacts, and are not able to close 
the gap between a terminal atom and its desired position. 
Inverse kinematics (IK) techniques from the robotics field 
have been adopted to sample conformations with exact 
loop closure [3-5,9]. However, these methods prioritize 
the loop closure constraint and do not take energies into 
account during sampling. Hence, some authors employ a 
secondary energy optimization step to generate more plau- 
sible conformations [3,6,10]. 

For each of these methods, the sampling distribution is 
throughly entangled with the sampling procedure. To 
achieve a desired distribution, the sampling procedure 
must be tuned in an opaque, nontrivial manner, and it is 
unclear that a desired distribution can even be achieved. 
Instead, extensive post-hoc empirical testing to assess the 
quality of the resulting distribution and to argue that a 
method samples well. Monte Carlo (MC) techniques 
represent a more principled class of approaches that sam- 
ple directly from a desired destribution. They have a long 
history of use in computational biology because they can 
quickly explore multiple energy minima and transition 
pathways, while molecular dynamics and optimization 
techniques often get stuck in single local minima [11-14]. 
They are also well-suited for generating conformation 
ensembles. The general Metropolis-Hastings approach 
generates a sequence of incrementally perturbed config- 
urations via a random walk, with a carefully-designed 
acceptance criterion (the detailed balance condition) that 
ensures that the sampling distribution approaches the 
desired one as more samples are drawn. However, there is 
a tradeoff in choosing the perturbation size: small pertur- 
bations raise the fraction of accepted moves but lower the 
speed of conformation space exploration. Moreover, stan- 
dard MC techniques cannot be directly applied to protein 



loops due to the loop closure constraint, which causes 
each step to be accepted with probability 0. 

Our new method overcomes many of the weaknesses of 
prior methods (see Table 1); it simultaneously scales to 
long loops (e.g., >10 residues) and produces unbiased 
ensembles of conformations. It uses a unified probabilistic 
graphical modeling (PGMs) framework for modeling a 
desired distribution from experimentally available statisti- 
cal priors. PGMs such as Bayesian networks and Markov 
random fields are powerful tools for inference in large 
domains with heterogeneous sources of information, and 
have been applied in a limited sense to protein structure 
prediction. They have been used to predict side-chain 
rotamer conformations [15] and conformations of macro- 
molecular assemblies from electron density maps [16]. 
Their use is reasonably well understood in the discrete 
case, but continuous variables often prove challenging. 
Our work derives a new Gibbs sampling inference method 
for continuous PGMs with loop-closure constraints, which 
restrict the feasible domain to a nonlinear implicit mani- 
fold. In particular we derive the mathematical relationship 
between an inverse kinematic sampling distribution and 
the manifold's metric tensor, which is necessary to com- 
pute the detailed balance condition in the Metropolis- 
Hastings algorithm. The resulting sampling sequence is 
unbiased in the sense that its distribution approaches the 
target distribution in the large-sample limit. 

Methods 

SLIKMC is a Markov chain Monte Carlo (MCMC) 
method that takes as input an experimental conformation 
scoring function <t>, a protein structure from the Protein 
Data Bank (PDB), the beginning and ending residues of 
the loop, and outputs a sequence of perturbed loop con- 
formations such that the sequence asymptotically 
approaches a probability distribution proportional to O. If 
the structure is missing, a rough initial structure is 
sampled using existing inverse kinematics loop closure 
techniques. To generate a subsequent conformation, it 
performs the following operations: 
For each 4-residue subloop, repeat the following steps: 

1. Sample a new subloop conformation that satisfies 
kinematic constraints. 

2. Compute the Metropolis-Hastings importance 
ratio a of the new conformation against the previous 
conformation. 

3. Accept or reject the new subloop conformation 
with probability a. 

The method terminates when a fixed number of con- 
formations are generated or until a desired time cutoff 
is reached. The novel contributions of this paper include 
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Table 1 Characteristics of loop generation techniques 


Technique 


Loop closure 


Prior distribution/energy function 


Global search 


Scalability 


Optimization 


Exact 


Y 


N 


+ 


nverse kinematics sampling 


Exact 


N 


Y 


++ 


Discrete search 


Inexact 


Y 


Finite subset 




Standard Monte Carlo 


No 


Y 


Y, reqs. mixing 


+ 


SLIKMC 


Exact 


Y 


Y, reqs. mixing 


++ 



an exact derivation of the importance ratio a for the 
inverse kinematics sampler of step 1 and the use of 
sparse PGMs to evaluate the importance ratio quickly 
per-subloop. We also describe extensions that handle 
flexibility in side-chains and molecules with multiple 
branches or loops (e.g., polycyclic compounds). 

As a MCMC method, SLIKMC samples from a com- 
plex joint probability distribution by constructing a 
Markov chain whose equilibrium distribution is equal to 
the desired distribution. It is a hybrid MCMC algorithm 
that combines blocked Gibbs sampling and Metropolis- 
Hastings (M-H) sampling. M-H permits the use of non- 
normalized probability distributions, which is important 
because it is relatively simple to define a useful scoring 
function but virtually impossible to ensure that it inte- 
grates to one. The blocked Gibbs sampling method sam- 
ples a small subloop at each step, which helps SLIKMC 
scale better to large chains, because acceptance rates 
decrease roughly exponentially in the number of vari- 
ables sampled at once. This section will first review clas- 
sical MCMC methods and then describe the new 
approach. 

Markov chain Monte Carlo framework 

Let x = (xi, x„) denote the state variables of the sys- 
tem of interest. Experimental conditions including hard 
constraints and soft preferences are encoded into a non- 
negative scoring function (t>(x l , x„). The score is 
required to have finite integral, is zero at states that vio- 
late hard constraints, and higher values indicate more 
desirable states. <t> is considered as an unnormalized 
probability density, and our goal is to generate samples 
with probability proportional to O. In other words, the 
goal is to sample from the normalized density P defined 
as: 



1 



P(Xi, ...,x„) = -<&(xi, ...,x n ) 



(1) 



where the proportionality constant Z that ensures that 
P integrates to 1. O is closely related to energy functions 
£(x) through the Gibbs measure 



O(x) = exp(-£(x)/T) 
where T is the system temperature. 



(2) 



The Metropolis-Hastings (M-H) algorithm addresses 
the problem that it is hard to sample directly from an 
unnormalized distribution <t> in part due to the difficulty 
of evaluating the normalization term Z [17]. On step k, 
M-H first samples a candidate move from x w to x' 
from a proposal distribution Q(x'; x^'), and accepts the 



a = min I 1 



x' with probability 

P(x')Q(xM;x') 
P(xC'))Q(x';xM) 



(3) 



This is the so-called detailed balance condition. The Z 
terms in the numerator and denominator cancel out, so 
we use <t> directly instead of P. If the move is rejected, 
then the current state is maintained: x ( * +1) <— x w . The 
term 



Ofx')Q(xW;x') 
4>(x( fe ))Q(x';x( fe )) 



(4) 



is called the importance ratio. If the ratio is greater 
than 1, then the new sample is accepted; otherwise it is 
accepted with probability equal to the ratio. With the 
detailed balance construction, P (x) is indeed the sta- 
tionary distribution of the Markov chain generated by 
successive samples. 

The key question for M-H is how to choose a propo- 
sal distribution that we can sample from and evaluate. 
The acceptance strategy must evaluate the terms in (3) 
exactly so that the M-H algorithm respects the detailed 
balance. One of our key contributions is a technique for 
evaluating Q exactly when sampling from closed chain 
submanifolds, which enables our method to generate an 
unbiased sampling sequence. 

Note that it is challenging to choose Q to approximate P 
closely, and hence in practice the probability of accepting 
a sample drops sharply in the dimensionality of the space. 
Gibbs sampling is commonly used to address this issue. 
Moreover it is convenient when sampling from a condi- 
tional density is easier than sampling from the entire joint 
density, which is the case in the sparse Bayesian models 
that we employ in this work. Given the current sample 
x' fc ' = [xf\ at time k, Gibbs sampling generates 
the next state x (<r+1) by sampling a single variable x| fe+1 ^ 
from the conditional density 
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P(.f +1) |xf +1) *|^,x« (5) 

and keeping the remaining variables fixed. The vari- 
able is updated and the index i is incremented in loop- 
ing fashion. If the dependencies between the variables 
are sparse (e.g., every variable x t only depends on a 
handful of variables rather than the remaining n - 1 
variables), then Gibbs sampling can be efficient even for 
very large problems. This is exploited in sparse PGMs. 
Blocked Gibbs sampling is a variation of Gibbs sampling 
that groups multiple variables as a block and samples 
the block from the joint distribution conditioned on all 
other variables. 

Our method combines Gibbs sampling with M-H sam- 
pling to generate a new sample from (5). To do so, simply 
consider all other variables fixed, sample x| from a condi- 
tional proposal distribution Q(x-; 

and then apply the importance ratio test as usual to deter- 
mine whether to accept the step x^ fe+1 ' -e- x' or keep 

x [k+1) = x [k) - 

Sparse factored models 

Due to the locality of interactions in most scoring func- 
tions of interest, it is possible to represent O in a 
factored form: 

0(xi,...,x„) = ]~[</>i(Si) ( 6 ) 

i 

where each <p, is known as a factor and each S, is a sub- 
set of {xi, x n ) known as the domain of the factor cp,-. 
For example, in protein structure prediction factors may 
include Ramachandran plots relating each pair of dihe- 
dral angles (</>, y/), steric clashes, energy functions defined 
over atom positions, and prior knowledge from B-factors 
or electron density maps. 

Probabilistic graphical models like Bayesian networks 
and Markov random fields are inherently factored: the 
domain of each factor consists only of a vertex and its 
neighbors in the graph. A graphical model is sparse if each 
variable x t is involved in only a handful of factors (i.e., 
bounded by a constant unrelated to n), and hence only 
interacts directly with a few other variables. An important 
consequence in the discrete case is that probabilistic infer- 
ence is computationally tractable in sparse models (poly- 
nomial in «), whereas inference is intractable in dense 
models (in general, exponential in n). A key step in our 
method converts the representation of a kinematic chain 
from dense to sparse form, as described below. The imple- 
mentation described in this paper currently supports: 

♦ Ramachandran plots (pRP( r )(<f>, y/) which vary by 
residue r. 

♦ Steric clashes <Psc(j, k)(Pj> Pk) which are 0 if atom / 
collides with atom k and 1 otherwise. 



♦ B-factors defined as Gaussians 

<t>BF ^ = 7^w j exp ~( ]]2 0^) where f*J is the 
predicted atom position and Bj is the B-factor value 
in the protein's PDB file. A constant of proportional- 
ity c can be set by the user according to his/her con- 
fidence in the quality of the B-factor estimates. 

• Side-chain rotamer distributions, as described the 
Side Chain Sampling section. 

Each factor can be evaluated quickly, but over thou- 
sands or millions of evaluations they accumulate signif- 
icant computational cost. Significant savings can be 
achieved in sparse models, because when a few vari- 
ables are changed, the change in O can be calculated 
quickly by only evaluating those factors involved, rather 
than recomputing <J> from scratch. Although steric 
clashes are theoretically considered as 0(n ) pairwise 
factors, in practice we use a grid-based hashing data 
structure that only checks nearby atoms for collision. 
As a result, each Gibbs sampling step can be per- 
formed in 0(1) time. 

In future work we are interested in including additional 
statistical potentials and/or all-atom energy function terms 
in scoring. With a naive implementation, each atom is 
involved in O(n) pairwise interactions, but we expect to 
exploit the weakness of distant interactions to reduce the 
number of factors included in the computation. 

Kinematic chain modeling 

Consider a jointed kinematic chain with reference frames 
T 0 , T lt T N , connected with relative rotational angles 
q lt q N . For a protein backbone, there is a one-to-one 
correspondence between frames and backbone atom posi- 
tions p u p M , and the rotational variables are simply the 
backbone dihedral angles (j>x, Wi> ■■■> 4>n/2> Wn/2- 

Although it is standard practice and beneficial for cer- 
tain algorithms to define the system state with a mini- 
mal set of coordinates, e.g., x = (T 0 , q lt q N ), a key 
step of our method is to consider an expanded state. 
Minimal coordinates use the fact that each subsequent 
frame 7\, T N can be determined from x through 
straightforward forward kinematics, leading to a lower 
dimensional representation. However, this approach 
eliminates sparsity in the probabilistic model because a 
factor defined on T N will depend on all variables, a fac- 
tor defined over T N . x will depend on all variables 
except q n , and so on. Moreover, if a sampler is asked to 
generate certain variables from a density defined over 
Ti, T N (for example, atom positions), the generated 
distribution may be biased unless it computes the deter- 
minant of an N x N metric tensor for each evaluation of 
O. As described below, this is a consequence of non- 
linear transformations of distributions (see Appendix). 
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On the other hand, computing determinants takes O 
(A/ 3 ) time, which scales poorly with large N. 

Our method represents an expanded state that incor- 
porates all spatial variables along with the conformation 
variables: x = (q v q N , T 0 , T N ). The joint probability 
density is then defined over angles and reference frames 
of all links along the chain (see Figure 1): 



$(x) = Y[ M S ') I"! <fainematic{Tj\Tj^, qj) 



(7) 



;'=i 



where each S t is now a subset of {q lt q n , T 0 , T n }, 
and where ^kinematic is the forward kinematic transform 
that defines the frame Tt in terms of the prior frame 
Tj-i and the relative angle qj. Because the transform is 
deterministic, (Pkinematic should be thought of as an indi- 
cator function. Taking the convention that each frame's 
origin lies on its joint's axis: 



(pkinematicWlj-l.qj) - \ Q Qtherwise 



(8) 



where Tj ei is the relative transformation of frame j 
relative to frame / - 1 and R{a, q) is the rotation of 
angle q about axis a. Fixed-endpoint constraints can 
also be encoded with indicator factors <p c / OSMre (r 0 ) and 
(Pciosure(T n ) that are zero everywhere except at the fixed 
frames. 

With (7) encoded so that factors contain few variables 
in their domain, the model becomes sparse. However, 
we have added the complication of maintaining a valid 
kinematic structure, because the set of x for which <D is 
nonzero lies on a lower-dimensional manifold. Techni- 
cally speaking, the probability density must be consid- 
ered with respect to a base measure that assigns finite, 



nonzero density to the manifold. For 3D chains, the 
state space has dimensionality 7N but the manifold has 
dimensionality 6 + N for free-endpoint chains or N - 6 
for fixed-endpoint chains. The next section will describe 
how we handle these submanifolds in detail. 

Block sampling and selection 

A block is a subset of variables that are simultaneously 
sampled. The number of variables in a block must be 
sufficiently large to give at least one continuous degree 
of freedom of movement. The Metropolis-Hastings cri- 
terion is used to accept or reject a move because it is 
unrealistic to sample directly from the block's condi- 
tional density. This key subroutine, Sample-Block-MH, 
takes as input the previous sample x^' and a block B of 
b consecutive joint angles and their intervening frames. 
It then samples a candidate move, and accepts it accord- 
ing to the M-H criterion. Pseudocode is as follows: 
Sample-Block-MH(x w , B): 

1. Using Sample-Block as described below, sample a 
candidate conformation x' B of B at random, keeping 
the rest of the chain x[? ' fixed. 

2. Compute the M-H acceptance probability 



a = min I 1 



t> B {x B )Q B {x's ) \x^) 



3. Accept the move x 



(fe+i) 



x' B with probability a. 



Here the subscript B denotes the subset of variables in 
the block, while the subscript C denotes the comple- 
ment of the block. The score <D B calculates the product 
of factors <p, whose domains 5, overlap with B, which is 
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more efficient than recomputing <D from scratch. The 
remaining details of the method - the block size, the 
block sampling procedure, and calculating the sampling 
probability Q B - are described in detail in the remainder 
of this section. To generate a new conformation x ( * +1) of 
the entire chain, Sample-Block-MH is called several 
times with overlapping blocks incremented sequentially 
down the chain. Block ordering (e.g. forward, backward, 
or random order) has no effect on the asymptotic distri- 
bution and experiments suggest virtually no noticeable 
effect apart from the first handful of samples. Thanks to 
sparsity, each pass is performed in O(N) time, which 
takes a fraction of a second for chains with hundreds of 
variables. 

How many variables should be included in a block? 
Standard Gibbs sampling (i.e., b = 1) does not work 
because loop closure constraints constrain the condi- 
tional density of any variable given the rest (5) to a 
Dirac. Hence, the state would never change. In fact, no 
mixing occurs for b < 5, except possibly at singular con- 
formations, which occupy a set of measure zero in con- 
formation space and are therefore unlikely to occur 
naturally. For 6 angles, analytical inverse kinematics (IK) 
techniques are available to compute solutions for a pair 
of fixed end frames [9]. In fact, any number from 0 to 
16 solutions may exist for a given 6-angle problem. 
Nevertheless, b = 6 is not suitable because it restricts 
the random walk to only a finite set of conformations. 

Setting b = 7 angles allows sufficient freedom to sample 
from a 1-dimensional manifold of solutions. In general, a 
block of b > 6 angles admits a b - 6 dimensional solution 
manifold. Denote the block B = {q h qi+b-i, T it T i+b _ 2 }, 
and let us call the first b - 6 angles of the block q b qub-7 
the independent subchain. Call the remaining 6 angles the 
dependent subchain. This is illustrated for a planar chain 
in Figure 2. A sampling procedure is as follows: 



Sample-Block 

1. Sample values for the independent subchain at 
random. 

2. Attempt to close the chain by calculating an ana- 
lytical IK solution for the dependent subchain. We 
use the method of [9]. 

3. If more than one IK solution exists, one is picked 
at random, and if no solution exists, the process ter- 
minates with failure. 

It is recommended that b > 7 be chosen as low as 
possible, because as b grows, the probability of sampling 
an independent subchain that admits closure drops off 
dramatically as b grows, particularly for "stretched out" 
conformations. In our implementation, 4 consecutive 
residues are considered as a block that contains b = 8 
angles since even numbers align better with the (cj), y/) 
angles priors of each residue (see Figure 3). (Throughout 
this discussion we have assumed a 3D chain but the 
method works equally well in 2D. For planar chains, at 
least 4 angles are needed, and the manifold of solutions 
is (b - 3) -dimensional) 

Calculation of sub-loop sampling densities 

To calculate the M-H importance ratio, we must calcu- 
late sampling density for the sampling procedure Sam- 
ple-Block. Several concepts from differential geometry 

are required in order to derive this density Q b (x'b|x[?)- 

Fix the endpoints of the block, and let M denote the 
(b - 6)-dimensional manifold of loop-closing conforma- 
tions. Let us call the (b - 6) angles of the independent 
subchain y, which are sampled w.r.t. the density P (y). 
Observe that the candidate sample x'g is distributed 
according to a nonlinear transformation of P(y) onto M. 
In fact, at non-singular conformations the independent 
subchain forms a local chart of M, which is a local 
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7-residue Chain 



Block 1 



Block 2 



Block 3 




Block 4 



Figure 3 Block selection. A 7-residue chain is shown with each residue drawn in a distinct color. SL1KMC incrementally samples block of 4 
consecutive residues (8 torsional angles) with the first 3 residues overlapping with the preceding block. 



bijection between R b - 5 to M centered at x' B (see Figure 4). 
Since there is a local bijection /between y and the point 
on the manifold x B , the sampling density over x B is given 
by: 



Q B (x B |x^) = 



Ply) 



sVdetG(y) 



(9) 



where 5 is the number of IK solutions at y and G is 
the metric tensor of the chart x B = J[y) (see Appendix). 
The inclusion of the metric tensor is a natural conse- 
quence of transformation of variables. For example, 
for the case b = 7, the metric tensor is the squared 
arc length of the 1-dimensional parametrization of M 
(Figure 4, bottom). In general, G is given by 



G(y) - (|(y) 



w 



(10) 



where g^(y) is the Jacobian of the function / Here we 
have also introduced a positive semidefinite weighting 
matrix W for the purpose of weighting the relative 
importance of matching the prior along certain axes. In 



the standard case, W is an identity matrix, but it can 
also be useful to choose a nonuniform diagonal matrix 
to account for heterogeneous units (e.g., angle vs. posi- 
tion variables). 

A remaining issue is that it is often difficult to expli- 
citly compute the Jacobian of the IK function involved 
in f. In other words, with z = z(y) denoting the 6 angles 
in the dependent chain, it is difficult to evaluate dz/dy. 
So, we compute an implicit chart Jacobian by consider- 
ing the implicit form of the constraints C(x B ) = 0. These 
vector-valued constraints state that the difference 
between the terminal frame of the subchain and the 
desired frame is zero. 

We have the constraint equation: 



0 = C(x B ) = C(y,z) 



(11) 



Taking the derivative of both sides of (11) with respect 
to y we get: 



_ 3C 3C3z 
9y 3z 3y 



(12) 
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J< Z 




*•...♦ s 



J< z 



df/dy 



Piy) 



x=(y,z(y))=f(y) 



A Q(f(y)) 




Figure 4 Sampling distributions on manifold charts. Top: abstract illustration of how analytical IK implicitly decomposes a 1 -parameter 
manifold M into a set of local bijections (charts). Bottom: the Jacobian of a chart must be taken into account when calculating the sampling 
distribution 0 over M. 



and hence 
9z _ 

9y = 



3C. V l dC 



(13) 



holds as lone as £ is invertible, which is true every- 
where except at singular conformations. Each derivative 
of C in the above expression is a submatrix of the Jaco- 
bian and can be computed using standard techniques. 

Finally, since 



f{j) T = [?,#,T i ,...T Mh . z ] 
we obtain the Jacobian 



df 
3y 



I, 



dz T dTi 
3y ' dy ' 



i+b-2 



dy 



(14) 



(15) 



in which / is the identity matrix and all frame deriva- 
tives are calculated using the chain rule -5 = 15 + ^P- I s . 

° ay dy dz dy 

These partial derivatives are calculated using standard 
techniques. 

Beyond computing the proper sampling density, it is 
also important to design the algorithm to efficiently 



compute the M-H acceptance probability. Since clash 
detection takes 60 times more computation time than 
calculating the rest of the terms in <J>, we check collisions 
after determining whether a move will be accepted. 
Compared to the naive method, this method achieves an 
order of magnitude speedup. 

Extension to other topologies 

Although the core method applies to linear closed kine- 
matic chains, it can be extended to handle other molecular 
topologies, such as free-endpoint chains and side-chains. 
In theory, polycyclic compounds may also be handled as 
well. Each new topological structure requires specialized 
block selection and sampling routines. For example, free- 
end-point chains need separate sampling subroutines for 
the start and end blocks. Standard MC methods are used 
to do so. 

Side-chain deformations are important for shaping 
binding cavities, and SLIKMC can be adapted to gener- 
ate side-chain conformations in the same graphical 
modeling framework. It is known that the side-chain 
conformation depends on the backbone dihedral angle 
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of the corresponding residue [18]. This requires sam- 
pling side-chains after the backbone conformation is 
sampled. Furthermore, since the distribution of side- 
chain torsional angles are limited to small number of 
typical conformations (rotamers) for most residues [19], 
we sample side-chains according to experimentally- 
determined distributions. 
Side-chain sampling 

For side-chain conformation priors we use the 2010 
Backbone-dependent Rotamer Library [20]. In this 
library, each rotameric residue is associated with a list 
of rotamers which representing the high probability 
regions for side-chain torsion angles. The probability of 
a rotamer conformation % is modeled as a continuous 
distribution given the backbone dihedral angle pairs. 
The dihedral angle (<f>, y/) space of each rotameric back- 
bone residue r is discretized into a grid and each cell 
[a, b] x [c, d] contains its experimentally observed prob- 
abilities P (x \ r, a < (f> < b, c < i// < d). Each distribu- 
tion over x is specified as a Gaussian mixture model. 
For non-rotameric residues the terminal % angles are 
handled specially due to the asymmetry in their 
distributions. 

Treating the remainder of the protein as fixed, we 
model the target distribution of a side-chain x s , condi- 
tional on backbone dihedral angles x b , as follows: 



The M-H importance ratio is 



<J> s (x s |x b ) = </>Sc(X5|x b )(/> J ;( r )(x s |x b ) 



(16) 



where (p sc indicates steric-clashes and (pu( r ) indicates 
the side-chain conformation prior for residue r. Side- 
chain B-factors are typically not included since we want 
to give enough freedom to explore the conformation 
space, and our experiments indicate that the flexibility 
of the protein chain will reduce greatly when we specify 
B-factors as prior to both backbone and side-chain 
atoms. 

Extending block sampling to include side-chains 
requires justifying the importance ratio carefully to 
ensure unbiased sampling. An efficient sampling proce- 
dure is as follows: first compute a closed-loop backbone 
subchain from the blocked Gibbs sampling step and 
compute its acceptance probability as usual. If accepted, 
sample each side chain along the block according to its 
backbone-dependent rotameric distribution. Because it 
is a Gaussian mixture, we can sample from <p# (r ) directly: 
pick a Gaussian from the mixture according to its 
weight and then sample from the Gaussian. Finally, 
reject the sample if the side chains collide. 

To justify this procedure, we show that its acceptance 
probability is equal to the M-H acceptance probability for 
the entire block including side-chains. Let the block be x B = 
{x b , x s ) and a candidate block sample x'g = (x'b,x' s ), with b, 
s denoting backbone and side-chain variables respectively. 



/(Xfc.X,) 



<t>{x' b ,x' s )Q{x b \x s ) = <t> h {x' b )<t> s {x' s \x' b )Q{x b )Q{x s \x b ) 
$(xbMQ{x'b,x' s ) *i,(xi,)* s (x s |x t )Q(x' i ,)Q(x' s |x' t ) 



(17) 



by conditioning on x b . Since we sample the side-chain 
according to Q(x s |x & ) = <p«( r )(x s | x^,) and the prior sam- 
ple is clash-free, we cancel terms in the numerator and 
denominator to get: 



f , , $b(xfc)Q(x b ) 
<t> b (x b )Q{x b ) 



(18) 



Since the first term is simply the importance ratio of 
the backbone and (psc is binary, we conclude that the 
block acceptance probability is either the backbone 
importance ratio if clash-free or zero if clashing. Hence 
the side-chain sampling procedure is sound. 
Multiply-closed kinematic loops 

It may be possible to extend SLIKMC to handle multiply- 
closed loops such as those that occur in polycyclic com- 
pounds. This requires special care to divide the structure 
into blocks that can be split into dependent and indepen- 
dent subchains, such that a conformation of the indepen- 
dent subset completely determines the dependent subset, 
up to some finite multiplicity. In other words, the indepen- 
dent subchains form a chart of the space of closed-chain 
conformations of the whole block. The union of all blocks 
must also cover all state variables. 

We illustrate the principle on planar kinematic chains, 
which require blocks of size at least 4. Assume each cycle 
contains at least 3 joints. We define a topological order- 
ing by selecting a linear main chain and considering 
branches off of the main chain. Non-branching linear 
blocks, free-endpoint blocks, and side-chains (open- 
ended branches) are handled as described above. Each 3- 
joint branch off of a branching block is then considered 
as part of a dependent subchain (see Figure 5). Branches 
off the dependent subchain are also added to the block in 
a recursive manner, leading to a block with a tree 
topology. 

To sample a branching block, we first sample values 
for the independent subchain at random and then close 
the loops for each branch according to their topological 
order. To ensure unbiased sampling, we must also cal- 
culate the metric tensor in (9) for the entire branched 
block. This in turn requires computing the Jacobian of 
the chart, which requires computing the Jacobian of the 
implicit form for the multiple loop-closure constraints 
(11). Due to the tree structure the Jacobian is sparse, 
and the matrix inversion in the implicit chart Jacobian 
(13) can also be computed efficiently. We have imple- 
mented this approach on 2D chains with closed rings 
(see Figure 6), and extending it to 3D chains remains a 
problem for future work. 
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Figure 5 Extending to non-linear topologies via branching blocks. Several branching structures may be treated as blocks. Independent 
chains (shaded) must be chosen to parameterize the manifold of configurations satisfying closed chain constraints (open circles). 







Figure 6 Results on a planar multi-loop structure. Fluctuations of a 2D chain with a closed ring constrained on the three ends (open circles). 
Left: initial conformation. The angular prior for each link is modeled as a normal distribution with 20° standard deviation. Right: 20 samples with 
skip length 100. 
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Mixing and autocorrelation 

In any MCMC method it is important to empirically 
examine the mixing rate of the Markov Chain. Firstly, it 
can potentially take many iterations to "forget" the effects 
of a poor initialization. For protein sampling, this is not a 
significant problem because we initialize the chain with 
the native structure in PDB, which is typically quite good. 

Secondly, subsequent samples are highly autocorrelated, 
and many conformations must be skipped to obtain a 
sequence with low autocorrelation. This is a serious con- 
cern because autocorrelation grows stronger as more vari- 
ables are included in the conformation (see Figure 7). In 
practice, one must determine the skip length empirically 
in order to obtain a quasi-independent sampling sequence, 
which is defined as a sequence with autocorrelation below 
some given threshold (0.2 is used in our experiments). 

Result and discussion 

The SLIKMC algorithm implements a scalable frame- 
work for Monte Carlo sampling of kinematic chains. 
The technique uses a blocked Gibbs sampler that pro- 
poses movements of small subchains of conformation 
angles at once, along with a Metropolis-Hastings techni- 
que that guarantees an unbiased sampling of the loop- 
closure submanifold for that block. Due to the small 
block size, each energy function is local and adjustments 
are fast, ranging from microseconds to milliseconds. 
The method is mathematically proven to generate a sta- 
tistically unbiased sample in the large sample limit. It is 
particularly well-suited for closed loops (see Figure 8) 
but can also be advantageous for chains with free end- 
points as well (see Figure 9). 

SLIKMC is implemented as an add-on to the software 
package LoopTK [21] [22] for protein loop sampling and 
is available at http://www.iu.edu/~motion/slikmc/. All 



experiments are run on a Intel i7 2.7 GHz computer with 
4 GB RAM. The library currently supports sampling with 
prior information from Ramachandran plots, steric 
clashes, and B-factors, and supports integration with the 
Backbone-Dependent Rotamer Library for side-chain 
sampling. Numerical experiments suggest that SLIKMC 
generates higher quality samples for large loops with 
lower computational cost than standard Monte Carlo 
techniques for open-ended chains and the RAMP loop 
completion package [23] . 

Loop sampling with prior distributions 

We consider the 10-residue closed loop 1AMP181-190, 
which is a representative segment for testing loop 
reconstruction algorithms [24]. SLIKMC is applied to 
sample 2000 conformations from a joint probability that 
includes steric clashes, Ramachandran plots, and B-fac- 
tors. The Ramachandran plot (see Figure 10) shows that 
the distribution of dihedral angles is contained within 
high probability regions but explores relatively widely, 
with an average angular deviation of 37°. 

We compared our method with the discrete-search 
loop construction software RAMP [8,23,25]. The latest 
version 0.7b was used in these experiments. We test the 
methods on loops of 1AMP with different lengths start- 
ing from residue 181. RAMP is tested by calling loop 
closure function with 0.5 A distance tolerance. SLIKMC 
samples from perturbed segments using Ramachandran 
plots as prior with clash-free constraint. Figure 11 plots 
the average time for both methods to obtain one closed 
conformation. Due to combinational explosion, the time 
required for RAMP increases exponentially and is 
impractical for >6 residues. 

We also compare SLIKMC with a sample-then-select 
inverse kinematics method that first samples a set of 




Figure 7 Mixing of SLIKMC samples. Sampling conformations of a planar 20-link chain, anchored at the endpoints, with a uniform prior. Left: 
starting from a deliberately bad initial conformation. Middle: the sequence mixes relatively quickly, but the first 40 samples are biased by the 
initial conformation and autocorrelate strongly. Right: a sequence that takes every 40'th sample does not significantly autocorrelate. 
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Figure 8 Closed-chain sampling. Three sampling methods for a 20-link closed-loop chain. At left, the prior gives preference to joint angles 

with small magnitude. At right, the prior gives preference to joint positions in a triangle shaped distribution (circle centers: means, shaded 

circles: 3o~ spreads). Top: sampling joint angles followed by numerical loop closure, best 20/20,000 samples. Middle: sampling with RLG [5], best 

20/20,000 samples. Bottom: SLIKMC, displayed every 40'th sample. These sample sets are generated by our method approximately as fast as RLG 

and an order of magnitude faster than numerical loop closure. 
* J 
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Figure 9 Free-endpoint chain sampling with heterogeneous priors. Comparing SLIKMC against a standard Metropolis-Hastings (M-H) 
sampler on a free-endpoint chain with heterogeneous prior distribution over joint positions (crosses: means, shaded circles: 3o~ spreads). For 
each method, 400 iterations are run and every 20'th sample is retained, taking approximately 10 s time on a standard PC. Left: M-H takes steps 
that are too large and only generates 4 unique samples. Middle: M-H with step size reduced by 10 has a higher success rate but slower 
convergence. Note the lack of variance in the leftmost point. Right: our method. 
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Figure 10 Ramachandran plot of SLIKMC samples. Left: the Ramachandran plot of generic residues from a database that includes 500 high- 
resolution proteins [26] used as a prior. Right: the Ramachandran plot for the generic residues in our 10-residue test protein (1AMP 181-190) 
generated from 2,000 consecutive samples. Each color represents one residue. (This figure is best viewed in color.) 
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Figure 1 1 Running time comparison between RAMP and SLIKMC. Time required for the discrete search method RAMP and SLIKMC to obtain one 
sample for loops of varying size. The time required for RAMP increases exponentially while our method runs in approximately constant time. 
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clash-free, loop-closing conformations and then extracts 
the top scoring ones. The LoopTK configuration sampling 
method [22] was used here. Given 300 s cutoff time, 
LoopTK generates 888 conformations, while our method 
generates 705. Figure 12 shows how the top 20 samples of 
LoopTK compare with every 100th sample of our method. 
The upper figures are generated using the original 
B-factors in the PDB file. SLIKMC matches the prior 
information more closely and obviates the need for post- 
processing using numerical optimization. The bottom fig- 
ures correspond to enlarged B-factors, which indicate less 
confidence in the values prescribed by the PDB file. The 
distribution of SLIKMC samples has approximately three 
times the variance of the original, which closely matches 
theoretical predictions. Figure 13 shows the distribution of 



RMSDs of C a atoms compared to the native structure for 
both LoopTK and SLIKMC with varying scales in B-fac- 
tors. This suggests that SLIKMC better supports variations 
in the experimenter's relative confidence in heterogeneous 
sources of information. 

Missing loop completion 

We now consider an application to completion of miss- 
ing loops. Given the starting position and ending position 
of a missing segment, we first generate an arbitrary loop- 
closing configuration, then run SLIKMC to perturb it to 
a high-probability conformation. As a test case, we select 
a helix structure (residue from 40-51) from an APO pro- 
tein 1B8C. We generate an arbitrary loop-closing config- 
uration by running the LoopTK configuration sampling 






Figure 12 Comparing SLIKMC against sample-then-select. Left: samples generated by SLIKMC with a skip length of 100. Right: samples 
generated by post-selecting the top 20 scoring samples generated from the LoopTK IK sampler. Transparent balls depict the 3o" spread of the 
atom position prior derived from its B-factor. The top row uses the original B-factors, while the bottom row enlarges B-factors by 10. 
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Figure 13 RMSD distributions from SLIKMC against IK. Histogram of RMSD to the native structure for samples from SLIKMC and the LoopTK 
sampler on 1AMP 181-190. With SLIKMC the use of prior information allows fine-grained control over the sampling distribution. 



method [22] to perturb the original conformation. Start- 
ing from the highly disordered conformation, SLIKMC 
is run for 2 minutes using priors including enlarged 
B-factors (scaled by a factor of 10) from original chain 
segment and Ramachandran plots with steric clash-free 
constraints. SLIKMC generates approximately 300 sam- 
ples within the time limit, and the closest sample to the 
original structure is with RMSD 0.2704 calculated from 
backbone atoms (see Figure 14). In contrast, the LoopTK 



configuration sampling method did poorly in construct- 
ing a favorable missing loop. 

Scalability tests on free-endpoint chains 

To further study scalability, we apply SLIKMC to sub- 
chains of chain A in a calcium-binding protein 1B8C. 
Samples for a 30-residue subchain are generated in 1 s 
(Figure 15) and samples for the entire 108-residue 1B8C 
protein are generated in approximately 4 s (Figure 16). 




Figure 14 Helix recovery. Left: Starting from a highly perturbed conformation, SLIKMC recovers a helix using only clash and Ramachandran 
plots information. Every 20 samples are drawn. The final displayed conformation has RMSD 0.2704 to the PDB structure. Right: by comparison, 
an IK technique attains a minimum RMSD of 4.0655 out of 1 3,000 samples (90 minutes running time). 
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Figure 16 Samples of a 108-residue chain. 17 samples of 1B8C chain A (108 residues) selected from 170 consecutive samples with skip length 
10. Each conformation is drawn in a distinct color. 



Zhang and Hauser BMC Structural Biology 2013, 13(Suppl 1):S9 
http://www.biomedcentral.com/1472-6807/13/S1/S9 



Page 1 7 of 20 



We compare SLIKMC against a standard Metropolis- 
Hastings algorithm that samples backbone angles accord- 
ing to a Gaussian proposal distribution with 1° standard 
deviation. The target distribution for both methods 
includes steric clashes, Ramachandran plots, and B-factors. 
Note that standard M-H has probability zero of sampling a 
conformation that satisfies terminal endpoint constraints 
exactly, and is not applicable to closed loops. So, these 
tests ignore the loop closure constraint altogether. 

Figure 17 displays the average time needed to obtain one 
quasi-independent sample over ten 30-minute runs for dif- 
ferent chain lengths. The skip lengths are determined 
empirically for each run. This data suggests that our 
method achieves a cost per quasi-independent sample that 
is nearly linear to the length of the chain. In contrast, the 
likelihood that standard M-H accepts a sample drops dra- 
matically as the number of residues increases, leading to 
exponentially growing cost per sample. 

Simultaneous backbone and side-chain sampling 

We demonstrate backbone and side-chain sampling 
using a 15-residue helix structure 1AMP 120-134. As 
priors we use backbone-dependent rotamer distributions, 
Ramachandran plot priors, B-factors for the backbone, 
and testing self-collision and collision against the non- 



loop portion of the chain. Given 20 min cutoff time, 
1,623 samples are generated. Figure 18 illustrates that in 
residue 130 (arginine), the distributions of torsional 
angles % 3 and Xi are limited due to steric clashes, while 
X\ and % 2 match well with the priors. 

Conclusion 

We propose SLIKMC - a Markov chain Monte Carlo 
method for sampling closed chains according to speci- 
fied probability distribution. A probabilistic graphical 
model (PGM) is proposed to specify the structure pre- 
ferences. A novel method for sampling sub-loops is devel- 
oped to generate statistically unbiased samples of 
probability densities restricted by loop-closure constraints 
and mathematical conditions necessary for unbiased sam- 
pling is derived. Simulation experiments show that 
SLIKMC completes large loops (>10 residues) orders of 
magnitude faster than standard Monte Carlo and discrete 
search techniques. 

SLIKMC is demonstrated to be applicable to various 
tasks such as conformation ensemble generation, missing 
structure construction. For future work we intend to 
integrate SLIKMC with more complex energy functions, 
statistical potentials, and machine-learning-based struc- 
tural function predictors. Another limitation of the 
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Figure 17 Running time comparison between SLIKMC and standard Metropolis-Hastings Time required to obtain one quasi-independent 
sample on open-ended subchains of 1B8C with a variety of lengths. Standard M-H did not generate even one sample for chain lengths above 
30 after 30 minutes. 
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Figure 18 Side-chain distribution of residue ARG. Left: Gaussian mixture distribution of side-chain torsion angles for the native structure of 
residue 130 (arginine) in protein 1AMP. Right: histograms of side-chain angles from samples generated by SLIKMC. The distributions of Xi, Xi 
match well with the sampling distributions while the distributions of Xi an d X$ are limited due to steric clashes. 



technique is that due to the locality of each block adjust- motion must cross low-scoring chasms in conformation 
ment, large-magnitude global motions may take a huge space. We intend to investigate annealing-like or random 
number of iterations to sample, particularly when the restart techniques for overcoming these difficulties, as 
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well as different block choices that allow the algorithm to 
take larger steps. Finally, we are interested in extending our 
method to study simultaneous backbone and side-chain 
flexibility in protein-ligand and protein-protein binding. 

Appendix 

This appendix presents a fundamental statement about 
probability densities under a transformation of variables. 

Suppose u e R m and veR" are multivariate random 
variables related by v = /(u), where f is differentiable and 
injective. Denote the image of A c R m as M = /(A) c W. If 
g u is a density with support over A, then the corresponding 
density over M, with respect to the m-volume measure, is 



&(v)=g u r 1 (v))/ A /detG[f-i(v)) 
where G(u) is the metric tensor: 

<*»-(&■>)'(&■>)■ 

More precisely, g v as defined above satisfies: 

/ gv(y)dfi = / g u (u)du 
Jf(u) Ju 



(19) 



(20) 



(21) 



for any subset U Q A, where dp is the m-volume ele- 
ment of M. 
From change of variables we have: 



/ g v (v)dn = f & (f(u))X(u)Ai 

Jf{U) Ju 



(22) 



where X(u) is the m-volume of the parallelotope 
spanned by the axes of the coordinate chart / centered 

We now use the fact that the m-volume V of the par- 
allelotope spanned by m vectors Vi, ...,v m e R is given 
by the determinant: 



V 2 = det 



/ v[vi v[v 2 

T T 
V^Vl vi,V 2 



\ v m v l v m v 2 



v|V m 



v m v m/ 



(23) 



Note that this can be expressed more compactly as det 
(A T A) where A is the matrix with v 1; v m as its col- 
umns. Hence, X(u) = -y/detG(u)- Finally, substituting g u 
in the r.h.s. of (22) gives the desired result. 
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